Overview

Dataset statistics

Number of variables: 14
Number of observations: 800
Missing cells: 386
Missing cells (%): 3.4%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 87.6 KiB
Average record size in memory: 112.2 B
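The missing-cell percentage can be reproduced from the raw counts. This is a minimal sketch, assuming the percentage is the number of missing cells divided by rows times columns; the variable names are illustrative and not taken from the report.

```python
# Counts copied from the dataset statistics in this report.
n_variables = 14
n_observations = 800
missing_cells = 386

# Assumption: the denominator is every cell in the table (rows x columns).
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

print(f"Missing cells: {missing_pct:.1f}% of {total_cells} cells")
```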

Variable types

NUM: 9
CAT: 3
BOOL: 2

Reproduction

Analysis started: 2020-06-02 21:21:28.719419
Analysis finished: 2020-06-02 21:21:50.185813
Duration: 21.47 seconds
Version: pandas-profiling v2.8.0
Command line: pandas_profiling --config_file config.yaml [YOUR_FILE.csv]
Download configuration: config.yaml

Warnings

Gen is highly correlated with # (High correlation)
# is highly correlated with Gen (High correlation)
Type 2 has 386 (48.3%) missing values (Missing)
# is uniformly distributed (Uniform)
Name has unique values (Unique)

Variables

#
Real number (ℝ≥0)

HIGH CORRELATION
UNIFORM

Distinct count: 721
Unique (%): 90.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 362.81375
Minimum: 1
Maximum: 721
Zeros: 0
Zeros (%): 0.0%
Memory size: 6.2 KiB

Quantile statistics

Minimum: 1
5-th percentile: 34.95
Q1: 184.75
median: 364.5
Q3: 539.25
95-th percentile: 689.05
Maximum: 721
Range: 720
Interquartile range (IQR): 354.5

Descriptive statistics

Standard deviation: 208.3437976
Coefficient of variation (CV): 0.574244492
Kurtosis: -1.165705095
Mean: 362.81375
Median Absolute Deviation (MAD): 177.5
Skewness: -0.001122502762
Sum: 290251
Variance: 43407.13798
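Several of the statistics reported for # are simple functions of the others. The sketch below recomputes them from the values printed in this report, assuming the usual definitions (CV as standard deviation over mean, IQR as Q3 minus Q1, variance as the squared standard deviation); the variable names are illustrative.

```python
# Values copied from the "#" variable in this report.
mean = 362.81375
std = 208.3437976
q1, q3 = 184.75, 539.25
minimum, maximum = 1, 721

cv = std / mean                  # coefficient of variation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range
variance = std ** 2              # variance from the standard deviation

print(cv, iqr, value_range, variance)
```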
Histogram with fixed size bins (bins=10)

Common values

Value               Count  Frequency (%)
479                 6      0.8%
386                 4      0.5%
711                 4      0.5%
710                 4      0.5%
150                 3      0.4%
6                   3      0.4%
413                 3      0.4%
646                 3      0.4%
303                 2      0.2%
302                 2      0.2%
Other values (711)  766    95.8%

Minimum 5 values

Value  Count  Frequency (%)
1      1      0.1%
2      1      0.1%
3      2      0.2%
4      1      0.1%
5      1      0.1%

Maximum 5 values

Value  Count  Frequency (%)
721    1      0.1%
720    2      0.2%
719    2      0.2%
718    1      0.1%
717    1      0.1%

Name
Categorical

UNIQUE

Distinct count: 800
Unique (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 6.2 KiB

Value               Count  Frequency (%)
Nidorino            1      0.1%
Absol               1      0.1%
ShayminSky Forme    1      0.1%
Sandile             1      0.1%
DeoxysNormal Forme  1      0.1%
Registeel           1      0.1%
Drowzee             1      0.1%
Typhlosion          1      0.1%
Krookodile          1      0.1%
Lickitung           1      0.1%
Other values (790)  790    98.8%

Length

Max length: 25
Median length: 8
Mean length: 8.84125
Min length: 3

Type 1
Categorical

Distinct count: 18
Unique (%): 2.2%
Missing: 0
Missing (%): 0.0%
Memory size: 6.2 KiB

Value             Count  Frequency (%)
Water             112    14.0%
Normal            98     12.2%
Grass             70     8.8%
Bug               69     8.6%
Psychic           57     7.1%
Fire              52     6.5%
Electric          44     5.5%
Rock              44     5.5%
Ground            32     4.0%
Ghost             32     4.0%
Other values (8)  190    23.8%

Length

Max length: 8
Median length: 5
Mean length: 5.26
Min length: 3

Type 2
Categorical

MISSING

Distinct count: 18
Unique (%): 4.3%
Missing: 386
Missing (%): 48.3%
Memory size: 6.2 KiB

Value             Count  Frequency (%)
Flying            97     12.1%
Ground            35     4.4%
Poison            34     4.2%
Psychic           33     4.1%
Fighting          26     3.2%
Grass             25     3.1%
Fairy             23     2.9%
Steel             22     2.8%
Dark              20     2.5%
Dragon            18     2.2%
Other values (8)  81     10.1%
(Missing)         386    48.2%

Length

Max length: 8
Median length: 3
Mean length: 4.3725
Min length: 3

Total
Real number (ℝ≥0)

Distinct count: 200
Unique (%): 25.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 435.1025
Minimum: 180
Maximum: 780
Zeros: 0
Zeros (%): 0.0%
Memory size: 6.2 KiB

Quantile statistics

Minimum: 180
5-th percentile: 250
Q1: 330
median: 450
Q3: 515
95-th percentile: 630
Maximum: 780
Range: 600
Interquartile range (IQR): 185

Descriptive statistics

Standard deviation: 119.9630398
Coefficient of variation (CV): 0.2757121362
Kurtosis: -0.5074607103
Mean: 435.1025
Median Absolute Deviation (MAD): 85
Skewness: 0.1525299234
Sum: 348082
Variance: 14391.13091

Histogram with fixed size bins (bins=10)

Common values

Value               Count  Frequency (%)
600                 37     4.6%
405                 26     3.2%
500                 23     2.9%
580                 23     2.9%
300                 19     2.4%
490                 18     2.2%
525                 16     2.0%
480                 15     1.9%
495                 15     1.9%
330                 15     1.9%
Other values (190)  593    74.1%

Minimum 5 values

Value  Count  Frequency (%)
180    1      0.1%
190    1      0.1%
194    1      0.1%
195    3      0.4%
198    1      0.1%

Maximum 5 values

Value  Count  Frequency (%)
780    3      0.4%
770    2      0.2%
720    1      0.1%
700    9      1.1%
680    13     1.6%

HP
Real number (ℝ≥0)

Distinct count: 94
Unique (%): 11.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 69.25875
Minimum: 1
Maximum: 255
Zeros: 0
Zeros (%): 0.0%
Memory size: 6.2 KiB

Quantile statistics

Minimum: 1
5-th percentile: 35.95
Q1: 50
median: 65
Q3: 80
95-th percentile: 110
Maximum: 255
Range: 254
Interquartile range (IQR): 30

Descriptive statistics

Standard deviation: 25.53466903
Coefficient of variation (CV): 0.368685098
Kurtosis: 7.232078374
Mean: 69.25875
Median Absolute Deviation (MAD): 15
Skewness: 1.568224376
Sum: 55407
Variance: 652.0193226

Histogram with fixed size bins (bins=10)

Common values

Value              Count  Frequency (%)
60                 67     8.4%
50                 63     7.9%
70                 57     7.1%
65                 46     5.8%
75                 43     5.4%
80                 43     5.4%
40                 38     4.8%
45                 38     4.8%
55                 37     4.6%
100                32     4.0%
Other values (84)  336    42.0%

Minimum 5 values

Value  Count  Frequency (%)
1      1      0.1%
10     1      0.1%
20     6      0.8%
25     2      0.2%
28     1      0.1%

Maximum 5 values

Value  Count  Frequency (%)
255    1      0.1%
250    1      0.1%
190    1      0.1%
170    1      0.1%
165    1      0.1%

Atk
Real number (ℝ≥0)

Distinct count: 111
Unique (%): 13.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 79.00125
Minimum: 5
Maximum: 190
Zeros: 0
Zeros (%): 0.0%
Memory size: 6.2 KiB

Quantile statistics

Minimum: 5
5-th percentile: 30
Q1: 55
median: 75
Q3: 100
95-th percentile: 136.2
Maximum: 190
Range: 185
Interquartile range (IQR): 45

Descriptive statistics

Standard deviation: 32.45736587
Coefficient of variation (CV): 0.4108462318
Kurtosis: 0.1697173149
Mean: 79.00125
Median Absolute Deviation (MAD): 20
Skewness: 0.551613748
Sum: 63201
Variance: 1053.480599

Histogram with fixed size bins (bins=10)

Common values

Value               Count  Frequency (%)
100                 40     5.0%
65                  39     4.9%
80                  37     4.6%
50                  37     4.6%
85                  33     4.1%
60                  33     4.1%
75                  32     4.0%
70                  31     3.9%
90                  30     3.8%
55                  30     3.8%
Other values (101)  458    57.2%

Minimum 5 values

Value  Count  Frequency (%)
5      2      0.2%
10     3      0.4%
15     1      0.1%
20     8      1.0%
22     1      0.1%

Maximum 5 values

Value  Count  Frequency (%)
190    1      0.1%
185    1      0.1%
180    3      0.4%
170    2      0.2%
165    3      0.4%

Def
Real number (ℝ≥0)

Distinct count: 103
Unique (%): 12.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 73.8425
Minimum: 5
Maximum: 230
Zeros: 0
Zeros (%): 0.0%
Memory size: 6.2 KiB

Quantile statistics

Minimum: 5
5-th percentile: 35
Q1: 50
median: 70
Q3: 90
95-th percentile: 130
Maximum: 230
Range: 225
Interquartile range (IQR): 40

Descriptive statistics

Standard deviation: 31.18350056
Coefficient of variation (CV): 0.422297465
Kurtosis: 2.72626036
Mean: 73.8425
Median Absolute Deviation (MAD): 20
Skewness: 1.155912303
Sum: 59074
Variance: 972.4107071

Histogram with fixed size bins (bins=10)

Common values

Value              Count  Frequency (%)
70                 54     6.8%
50                 49     6.1%
60                 46     5.8%
80                 39     4.9%
40                 36     4.5%
65                 36     4.5%
90                 35     4.4%
100                33     4.1%
55                 32     4.0%
45                 32     4.0%
Other values (93)  408    51.0%

Minimum 5 values

Value  Count  Frequency (%)
5      2      0.2%
10     1      0.1%
15     4      0.5%
20     4      0.5%
23     1      0.1%

Maximum 5 values

Value  Count  Frequency (%)
230    3      0.4%
200    2      0.2%
184    1      0.1%
180    3      0.4%
168    1      0.1%

Sp. Atk
Real number (ℝ≥0)

Distinct count: 105
Unique (%): 13.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 72.82
Minimum: 10
Maximum: 194
Zeros: 0
Zeros (%): 0.0%
Memory size: 6.2 KiB

Quantile statistics

Minimum: 10
5-th percentile: 30
Q1: 49.75
median: 65
Q3: 95
95-th percentile: 131.05
Maximum: 194
Range: 184
Interquartile range (IQR): 45.25

Descriptive statistics

Standard deviation: 32.72229417
Coefficient of variation (CV): 0.4493586126
Kurtosis: 0.2978936607
Mean: 72.82
Median Absolute Deviation (MAD): 20
Skewness: 0.7446624978
Sum: 58256
Variance: 1070.748536

Histogram with fixed size bins (bins=10)

Common values

Value              Count  Frequency (%)
60                 51     6.4%
40                 49     6.1%
65                 44     5.5%
50                 39     4.9%
55                 35     4.4%
45                 33     4.1%
70                 30     3.8%
35                 29     3.6%
85                 27     3.4%
80                 27     3.4%
Other values (95)  436    54.5%

Minimum 5 values

Value  Count  Frequency (%)
10     3      0.4%
15     4      0.5%
20     8      1.0%
23     1      0.1%
24     2      0.2%

Maximum 5 values

Value  Count  Frequency (%)
194    1      0.1%
180    3      0.4%
175    1      0.1%
170    3      0.4%
165    2      0.2%

Sp. Def
Real number (ℝ≥0)

Distinct count: 92
Unique (%): 11.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 71.9025
Minimum: 20
Maximum: 230
Zeros: 0
Zeros (%): 0.0%
Memory size: 6.2 KiB

Quantile statistics

Minimum: 20
5-th percentile: 32.95
Q1: 50
median: 70
Q3: 90
95-th percentile: 120
Maximum: 230
Range: 210
Interquartile range (IQR): 40

Descriptive statistics

Standard deviation: 27.8289158
Coefficient of variation (CV): 0.3870368318
Kurtosis: 1.628394057
Mean: 71.9025
Median Absolute Deviation (MAD): 20
Skewness: 0.8540186115
Sum: 57522
Variance: 774.4485544

Histogram with fixed size bins (bins=10)

Common values

Value              Count  Frequency (%)
80                 52     6.5%
50                 50     6.2%
55                 47     5.9%
65                 44     5.5%
60                 43     5.4%
75                 40     5.0%
70                 40     5.0%
90                 36     4.5%
45                 35     4.4%
85                 30     3.8%
Other values (82)  383    47.9%

Minimum 5 values

Value  Count  Frequency (%)
20     6      0.8%
23     1      0.1%
25     11     1.4%
30     20     2.5%
31     1      0.1%

Maximum 5 values

Value  Count  Frequency (%)
230    1      0.1%
200    1      0.1%
160    2      0.2%
154    3      0.4%
150    7      0.9%

Spd
Real number (ℝ≥0)

Distinct count: 108
Unique (%): 13.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 68.2775
Minimum: 5
Maximum: 180
Zeros: 0
Zeros (%): 0.0%
Memory size: 6.2 KiB

Quantile statistics

Minimum: 5
5-th percentile: 25
Q1: 45
median: 65
Q3: 90
95-th percentile: 115
Maximum: 180
Range: 175
Interquartile range (IQR): 45

Descriptive statistics

Standard deviation: 29.06047372
Coefficient of variation (CV): 0.4256229903
Kurtosis: -0.2364366728
Mean: 68.2775
Median Absolute Deviation (MAD): 21
Skewness: 0.3579332951
Sum: 54622
Variance: 844.5111327

Histogram with fixed size bins (bins=10)

Common values

Value              Count  Frequency (%)
50                 46     5.8%
60                 44     5.5%
70                 37     4.6%
65                 36     4.5%
30                 35     4.4%
80                 33     4.1%
40                 32     4.0%
90                 31     3.9%
100                31     3.9%
55                 30     3.8%
Other values (98)  445    55.6%

Minimum 5 values

Value  Count  Frequency (%)
5      2      0.2%
10     3      0.4%
15     9      1.1%
20     15     1.9%
22     1      0.1%

Maximum 5 values

Value  Count  Frequency (%)
180    1      0.1%
160    1      0.1%
150    4      0.5%
145    3      0.4%
140    2      0.2%

Gen
Real number (ℝ≥0)

HIGH CORRELATION

Distinct count: 6
Unique (%): 0.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 3.32375
Minimum: 1
Maximum: 6
Zeros: 0
Zeros (%): 0.0%
Memory size: 6.2 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1
Q1: 2
median: 3
Q3: 5
95-th percentile: 6
Maximum: 6
Range: 5
Interquartile range (IQR): 3

Descriptive statistics

Standard deviation: 1.6612904
Coefficient of variation (CV): 0.4998241145
Kurtosis: -1.239575758
Mean: 3.32375
Median Absolute Deviation (MAD): 2
Skewness: 0.01425810028
Sum: 2659
Variance: 2.759885795

Histogram with fixed size bins (bins=10)

Common values

Value  Count  Frequency (%)
1      166    20.8%
5      165    20.6%
3      160    20.0%
4      121    15.1%
2      106    13.2%
6      82     10.2%

Minimum 5 values

Value  Count  Frequency (%)
1      166    20.8%
2      106    13.2%
3      160    20.0%
4      121    15.1%
5      165    20.6%

Maximum 5 values

Value  Count  Frequency (%)
6      82     10.2%
5      165    20.6%
4      121    15.1%
3      160    20.0%
2      106    13.2%

Legendary
Boolean

Distinct count: 2
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 6.2 KiB

Value  Count  Frequency (%)
0      734    91.8%
1      66     8.2%

Mega
Boolean

Distinct count: 2
Unique (%): 0.2%
Missing: 0
Missing (%): 0.0%
Memory size: 6.2 KiB

Value  Count  Frequency (%)
0      751    93.9%
1      49     6.1%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear function the angle to the x-axis does not affect r.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
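The calculation described above can be sketched in plain Python. This is an illustrative implementation, not the code the report generator runs; the function name `pearson_r` is hypothetical, and the example values are the HP and Atk columns of the first five sample rows in this report.

```python
from math import sqrt

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sample covariance and sample standard deviations (n - 1 in the denominator).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x) / (n - 1))
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y) / (n - 1))
    return cov / (std_x * std_y)

hp = [45, 60, 80, 80, 39]
atk = [49, 62, 82, 100, 52]
print(pearson_r(hp, atk))

# Shifting and rescaling one variable leaves r unchanged.
print(pearson_r(hp, [3 * v + 7 for v in atk]))
```

The n - 1 factors cancel in the ratio, so using population rather than sample estimates would give the same r.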

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
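As a rough sketch of that recipe, the snippet below ranks each variable and then applies the Pearson formula to the ranks. The helper names are hypothetical, and giving tied values their average rank is an assumption (it is the common convention, but the report does not state it).

```python
from math import sqrt

def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman_rho(x, y):
    """Pearson's r computed on the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))

# A monotonic but nonlinear relationship: rho is 1, Pearson's r is below 1.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(spearman_rho(x, y), pearson_r(x, y))
```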

Kendall's τ

Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is given by the number of concordant pairs minus the discordant pairs divided by the total number of pairs.
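The pair-counting definition above corresponds to the variant usually called tau-a. The sketch below implements exactly that definition with a hypothetical function name; library implementations commonly apply a tie adjustment (tau-b), so their results may differ when the data contain ties.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant pairs - discordant pairs) / total number of pairs."""
    concordant = 0
    discordant = 0
    pairs = list(combinations(range(len(x)), 2))
    for i, j in pairs:
        # Positive product: both variables move in the same direction.
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
        # Pairs tied in either variable count as neither.
    return (concordant - discordant) / len(pairs)

print(kendall_tau_a([1, 2, 3], [1, 3, 2]))  # 2 concordant, 1 discordant out of 3 pairs
```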

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. There is extensive documentation available here.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use a bias-corrected measure proposed by Bergsma in 2013, which can be found here.
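For illustration, the bias-corrected estimator can be written in a few lines of plain Python. This is a sketch of the Bergsma (2013) correction as it is commonly stated, not the report generator's own code; the function name and the choice to return 0 for degenerate inputs are assumptions.

```python
from collections import Counter
from math import sqrt

def cramers_v_corrected(x, y):
    """Bias-corrected Cramér's V for two equally long sequences of labels."""
    n = len(x)
    if n < 2:
        return 0.0  # assumption: undefined cases are reported as 0
    joint = Counter(zip(x, y))
    row_totals = Counter(x)
    col_totals = Counter(y)
    r, k = len(row_totals), len(col_totals)

    # Pearson chi-squared statistic of the contingency table.
    chi2 = 0.0
    for a, row_n in row_totals.items():
        for b, col_n in col_totals.items():
            expected = row_n * col_n / n
            chi2 += (joint.get((a, b), 0) - expected) ** 2 / expected

    phi2 = chi2 / n
    # Corrections to phi^2 and to the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    if denom <= 0:
        return 0.0  # a constant variable carries no association
    return sqrt(phi2_corr / denom)

labels = ["a", "b"] * 50
print(cramers_v_corrected(labels, labels))  # perfect association
```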

Missing values

Sample

First rows

Index  #    Name                       Type 1   Type 2   Total  HP   Atk  Def  Sp. Atk  Sp. Def  Spd  Gen  Legendary  Mega
0      1    Bulbasaur                  Grass    Poison   318    45   49   49   65       65       45   1    0          0
1      2    Ivysaur                    Grass    Poison   405    60   62   63   80       80       60   1    0          0
2      3    Venusaur                   Grass    Poison   525    80   82   83   100      100      80   1    0          0
3      3    VenusaurMega Venusaur      Grass    Poison   625    80   100  123  122      120      80   1    0          1
4      4    Charmander                 Fire     NaN      309    39   52   43   60       50       65   1    0          0
5      5    Charmeleon                 Fire     NaN      405    58   64   58   80       65       80   1    0          0
6      6    Charizard                  Fire     Flying   534    78   84   78   109      85       100  1    0          0
7      6    CharizardMega Charizard X  Fire     Dragon   634    78   130  111  130      85       100  1    0          1
8      6    CharizardMega Charizard Y  Fire     Flying   634    78   104  78   159      115      100  1    0          1
9      7    Squirtle                   Water    NaN      314    44   48   65   50       64       43   1    0          0

Last rows

Index  #    Name                       Type 1   Type 2   Total  HP   Atk  Def  Sp. Atk  Sp. Def  Spd  Gen  Legendary  Mega
790    714  Noibat                     Flying   Dragon   245    40   30   35   45       40       55   6    0          0
791    715  Noivern                    Flying   Dragon   535    85   70   80   97       80       123  6    0          0
792    716  Xerneas                    Fairy    NaN      680    126  131  95   131      98       99   6    1          0
793    717  Yveltal                    Dark     Flying   680    126  131  95   131      98       99   6    1          0
794    718  Zygarde50% Forme           Dragon   Ground   600    108  100  121  81       95       95   6    1          0
795    719  Diancie                    Rock     Fairy    600    50   100  150  100      150      50   6    1          0
796    719  DiancieMega Diancie        Rock     Fairy    700    50   160  110  160      110      110  6    1          1
797    720  HoopaHoopa Confined        Psychic  Ghost    600    80   110  60   150      130      70   6    1          0
798    720  HoopaHoopa Unbound         Psychic  Dark     680    80   160  60   170      130      80   6    1          0
799    721  Volcanion                  Fire     Water    600    80   110  120  130      90       70   6    1          0